322 research outputs found

    Consistent Estimation of Mixed Memberships with Successive Projections

    This paper considers the parameter estimation problem in the Mixed Membership Stochastic Block Model (MMSB), a quite general random graph model that allows for overlapping community structure. We present a new algorithm, successive projection overlapping clustering (SPOC), which combines ideas from spectral clustering with the geometric approach to separable non-negative matrix factorization. The proposed algorithm is provably consistent under MMSB with general conditions on the parameters of the model. SPOC is also shown to perform well experimentally in comparison to other algorithms.
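    The geometric ingredient named in this abstract, successive projection for separable non-negative matrix factorization, can be illustrated with a minimal sketch. This is the generic successive projection step only, not the authors' SPOC procedure; the function name `successive_projection` and the toy data are assumptions made for illustration. Given points that are convex combinations of a few "pure" rows, it repeatedly picks the row of largest norm and projects that direction out.

```python
import numpy as np

def successive_projection(X, k):
    """Return indices of k rows of X selected by successive projection.

    Hypothetical helper for illustration: at each step, take the row with
    the largest Euclidean norm, then project all rows onto the orthogonal
    complement of that row.
    """
    R = np.array(X, dtype=float)  # working copy of the residual
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=1)
        j = int(np.argmax(norms))
        selected.append(j)
        u = R[j] / norms[j]          # unit vector along the chosen row
        R = R - np.outer(R @ u, u)   # remove the component along u
    return selected

# Toy data: rows 1 and 3 are the pure rows (2, 0) and (0, 1);
# rows 0 and 2 are convex combinations of them.
X = np.array([[1.0, 0.5],
              [2.0, 0.0],
              [0.6, 0.7],
              [0.0, 1.0]])
pure_rows = successive_projection(X, 2)
```

    On this toy input the selected indices are the two pure rows; with noisy data the selection is only approximate.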

    Alternative sampling for variational quantum Monte Carlo

    Expectation values of physical quantities may be obtained accurately by evaluating integrals within many-body quantum mechanics, and these multi-dimensional integrals may be estimated using Monte Carlo methods. A previous publication showed that, for the simplest and most commonly applied strategy in continuum Quantum Monte Carlo, the random error in the resulting estimates is not well controlled. At best the Central Limit Theorem is valid in its weakest form, and at worst it is invalid and replaced by an alternative Generalised Central Limit Theorem and non-Normal random error. In both cases the random error is not controlled. Here we consider a new "residual sampling" strategy that reintroduces the Central Limit Theorem in its strongest form and provides full control of the random error in estimates. Estimates of the total energy and the variance of the local energy within Variational Monte Carlo are considered in detail, and the approach presented may be generalised to expectation values of other operators and to other variants of the Quantum Monte Carlo method. Comment: 14 pages, 9 figures

    Comparing spectra of graph shift operator matrices

    Typically, network structures are represented by one of three different graph shift operator matrices: the adjacency matrix, the unnormalised Laplacian matrix, or the normalised Laplacian matrix. To enable a sensible comparison of their spectral (eigenvalue) properties, an affine transform that preserves eigengaps is first applied to one of them. Bounds, which depend on the minimum and maximum degree of the network, are given on the resulting eigenvalue differences. The monotonicity of the bounds and the structure of networks are related. Bounds, which again depend on the minimum and maximum degree of the network, are also given for normalised eigengap differences, used in spectral clustering. Results are illustrated on the karate dataset and a stochastic block model. If the difference between the degree extremes is large, different choices of graph shift operator matrix may lead to disparate inference from network analysis; conversely, a smaller difference between the degree extremes results in consistent inference.
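    A small numerical check can make the limiting case concrete. The sketch below does not reproduce the paper's transform or bounds; it only illustrates the standard fact that on a d-regular graph (where the minimum and maximum degree coincide) the three spectra are exact affine images of one another. The helper names `shift_operators` and `cycle_graph` are assumptions introduced for this example.

```python
import numpy as np

def shift_operators(A):
    """Return adjacency, unnormalised and symmetric normalised Laplacians.

    Assumes A is a symmetric 0/1 adjacency matrix with no isolated nodes.
    """
    d = A.sum(axis=1)
    L = np.diag(d) - A
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_norm = np.eye(len(d)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return A, L, L_norm

def cycle_graph(n):
    """Adjacency matrix of the n-cycle (2-regular for n >= 3)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[(i + 1) % n, i] = 1.0
    return A

d = 2  # every node of a cycle has degree 2
A, L, L_norm = shift_operators(cycle_graph(6))
eig_A = np.sort(np.linalg.eigvalsh(A))
eig_L = np.sort(np.linalg.eigvalsh(L))
eig_L_norm = np.sort(np.linalg.eigvalsh(L_norm))

# On a d-regular graph: spec(L) = d - spec(A) and spec(L_norm) = spec(L) / d.
mapped_from_A = np.sort(d - eig_A)
```

    For graphs whose degrees vary, these identities no longer hold exactly, which is the regime the bounds in the abstract address.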

    Large Scale Spectral Clustering Using Approximate Commute Time Embedding

    Spectral clustering is a novel clustering method which can detect complex shapes of data clusters. However, it requires the eigendecomposition of the graph Laplacian matrix, whose cost is proportional to $O(n^3)$, and is thus not suitable for large-scale systems. Recently, many methods have been proposed to reduce the computation time of spectral clustering. These approximate methods usually involve sampling techniques, through which much of the information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither a sampling technique nor the computation of any eigenvector. Instead, it uses random projection and a linear-time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach has better clustering quality and is faster than state-of-the-art approximate spectral clustering methods.
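    For orientation, the quantity being approximated can be computed exactly on a tiny graph. The sketch below is not the paper's method: it uses a full eigendecomposition, which is exactly the cost the paper avoids. It assumes a connected, undirected graph, and the function name `commute_time_embedding` is chosen here for illustration. Squared Euclidean distances between embedded nodes equal commute times.

```python
import numpy as np

def commute_time_embedding(A):
    """Exact commute time embedding of a connected undirected graph.

    Row i is the embedding of node i; squared distances between rows
    equal commute times. Costs O(n^3), so it is for small graphs only.
    """
    d = A.sum(axis=1)
    L = np.diag(d) - A
    vals, vecs = np.linalg.eigh(L)
    keep = vals > 1e-10  # drop the zero eigenvalue of the Laplacian
    volume = d.sum()     # sum of degrees
    return np.sqrt(volume) * vecs[:, keep] / np.sqrt(vals[keep])

# Path graph 0 - 1 - 2
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X = commute_time_embedding(A)
ct_01 = float(np.sum((X[0] - X[1]) ** 2))  # commute time between nodes 0 and 1
ct_02 = float(np.sum((X[0] - X[2]) ** 2))  # commute time between nodes 0 and 2
```

    On the path graph the commute times are 4 for adjacent nodes and 8 for the two endpoints, which matches a direct random-walk calculation.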

    Graph similarity through entropic manifold alignment

    In this paper we decouple the problem of measuring graph similarity into two sequential steps. The first step is the linearization of the quadratic assignment problem (QAP) in a low-dimensional space, given by the embedding trick. The second step is the evaluation of an information-theoretic distributional measure, which relies on deformable manifold alignment. The proposed measure is a normalized conditional entropy, which induces a positive definite kernel when symmetrized. We use bypass entropy estimation methods to compute an approximation of the normalized conditional entropy. Our approach is purely topological (i.e., it does not rely on node or edge attributes, although it can potentially accommodate them as additional sources of information). It is competitive with state-of-the-art graph matching algorithms as sources of correspondence-based graph similarity, but its complexity is linear instead of cubic (although the complexity of the similarity measure is quadratic). We also determine that the best embedding strategy for graph similarity is provided by commute time embedding, and we conjecture that this is related to its invertibility property, since the inverse of the embeddings obtained using our method can be used as a generative sampler of graph structure. The work of the first and third authors was supported by the projects TIN2012-32839 and TIN2015-69077-P of the Spanish Government. The work of the second author was supported by a Royal Society Wolfson Research Merit Award.

    A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks

    This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks. The algorithm is based on geometric properties of the spectrum of the expected adjacency matrix in a random graph model that we call the stochastic blockmodel with overlap (SBMO). An adaptive version of the algorithm, which does not require knowledge of the number of hidden communities, is proved to be consistent under the SBMO when the degrees in the graph are (slightly more than) logarithmic. The algorithm is shown to perform well on simulated data and on real-world graphs with known overlapping communities. Comment: Journal of Theoretical Computer Science (TCS), Elsevier, to appear

    On the Interplay between Strong Regularity and Graph Densification

    In this paper we analyze the practical implications of Szemerédi’s regularity lemma in the preservation of metric information contained in large graphs. To this end, we present a heuristic algorithm to find regular partitions. Our experiments show that this method is quite robust to the natural sparsification of proximity graphs. In addition, this robustness can be enforced by graph densification.

    Learning an atlas of a cognitive process in its functional geometry

    Proceedings of the 22nd International Conference, IPMI 2011, Kloster Irsee, Germany, July 3-8, 2011. In this paper we construct an atlas that captures functional characteristics of a cognitive process from a population of individuals. The functional connectivity is encoded in a low-dimensional embedding space derived from a diffusion process on a graph that represents correlations of fMRI time courses. The atlas is represented by a common prior distribution for the embedded fMRI signals of all subjects. The atlas is not directly coupled to the anatomical space, and can represent functional networks that are variable in their spatial distribution. We derive an algorithm for fitting this generative model to the observed data in a population. Our results in a language fMRI study demonstrate that the method identifies coherent and functionally equivalent regions across subjects. Funding: National Science Foundation (U.S.) (IIS/CRCNS 0904625); National Science Foundation (U.S.) (CAREER grant 0642971); National Institutes of Health (U.S.) (NCRR NAC P41-RR13218); National Institute of Biomedical Imaging and Bioengineering (U.S.) (U54-EB005149); National Institutes of Health (U.S.) (U41RR019703); National Institutes of Health (U.S.) (P01CA067165); Seventh Framework Programme (European Commission) (n°257528 (KHRESMOI))

    Mathematical Analysis of Copy Number Variation in a DNA Sample Using Digital PCR on a Nanofluidic Device

    Copy Number Variations (CNVs) of regions of the human genome have been associated with multiple diseases. We present a mathematically sound and computationally efficient algorithm to accurately analyze CNV in a DNA sample using a nanofluidic device known as the digital array. This numerical algorithm computes the copy number variation and the associated statistical confidence interval, and is based on results from probability theory and statistics. We also provide formulas which can be used as close approximations.
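    The abstract does not state its formulas, so the following is only a sketch of the textbook Poisson correction commonly used for digital PCR, offered as an assumption about the kind of calculation involved rather than the authors' algorithm. If molecules land in partitions independently, the fraction of positive partitions p gives an estimated mean of -ln(1 - p) copies per partition. The function names and the example counts are hypothetical, and no confidence interval is computed here.

```python
import math

def copies_per_partition(positive, total):
    """Estimated mean copies per partition under a Poisson loading model.

    positive: number of partitions that amplified
    total:    number of partitions analysed
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)

def copy_number_ratio(target_positive, reference_positive, total):
    """Ratio of target to reference concentration on the same panel size."""
    reference = copies_per_partition(reference_positive, total)
    if reference == 0.0:
        raise ValueError("reference assay has no positive partitions")
    return copies_per_partition(target_positive, total) / reference

# Hypothetical counts for illustration only.
half_full = copies_per_partition(50, 100)    # half the partitions positive
equal_ratio = copy_number_ratio(200, 200, 765)
```

    Multiplying the ratio by the known copy number of the reference gene would give a copy number estimate for the target, under the same assumptions.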

    k is the Magic Number -- Inferring the Number of Clusters Through Nonparametric Concentration Inequalities

    Most convex and nonconvex clustering algorithms come with one crucial parameter: the $k$ in $k$-means. To this day, there is no generally accepted way to determine this parameter accurately. Popular methods are simple yet theoretically unfounded, such as searching for an elbow in the curve of a given cost measure. In contrast, statistically founded methods often make strict assumptions about the data distribution or come with their own optimization scheme for the clustering objective. This limits either the set of applicable datasets or the set of applicable clustering algorithms. In this paper, we strive to determine the number of clusters by answering a simple question: given two clusters, is it likely that they jointly stem from a single distribution? To this end, we propose a bound on the probability that two clusters originate from the distribution of the unified cluster, specified only by the sample mean and variance. Our method is applicable as a simple wrapper around the result of any clustering method that minimizes the $k$-means objective, which includes Gaussian mixtures and spectral clustering. In our experimental evaluation we focus on an application to nonconvex clustering and demonstrate the suitability of our theoretical results. Our SpecialK clustering algorithm automatically determines the appropriate value for $k$, without requiring any data transformation or projection, and without assumptions on the data distribution. Additionally, it is capable of deciding that the data consist of only a single cluster, which many existing algorithms cannot.